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Abstract 

The Metropolis implementation of the Monte Carlo algorithm has been developed to study the 
equilibrium thermodynamics of many-body systems. Choosing small trial moves, the trajectories 
obtained applying this algorithm agree with those obtained by Langevin's dynamics. Applying this 
procedure to a simplified protein model, it is possible to show that setting a threshold of 1° on the 
movement of the dihedrals of the protein backbone in a single Monte Carlo step, the mean quantities 
associated with the off-equilibrium dynamics (e.g., energy, RMSD, etc.) are well reproduced, while 
the good description of higher moments requires smaller moves. An important result is that the 
time duration of a Monte Carlo step depends linearly on the temperature, something which should 
be accounted for when doing simulations at different temperatures. 
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I. INTRODUCTION 



Since its publication by Metropolis, Rosenbluth, Teller and their wives in 1953, the 
algorithm designed for "equation of state calculations by fast computing machines" [l| has 
been used to obtain an approximation of the equilibrium properties of a wide range of 
classical systems. Of course, in fifty years, fast computing machines has become faster and 
faster, and this decreed the success of the algorithm. 

The Metropolis algorithm performs a sample of the configuration space of a system start- 
ing from a random conformation and repeating a large number of steps. Each step consists 
of attempting a transition to a new conformation x' choosing among a set of allowed moves, 
and accepting the attempt with probability min[l, exp(— (f/(s) — f/(a;'))/T)] where \J is the 
potential energy and T the absolute temperature in units of Boltzmann's constant. This is 
equivalent to solve numerically the master equation 

— — = j dx \p{x\t)w{x' ^ x) — p{x,t)w{x ^ x')] . (1) 

where the transition rates are 

U{x) - U{x') 



W[X ^ X) = Wo-pap[X 



X ■ mm 



1, exp 



T 



(2) 



where wq sets the time scale of the transitions and Pap{x' x) is the a priori probability 
of choosing the move which goes from the state x' to the state x. If the a priori probability 
satisfies Pap{x' x) = Pap{x x') and allows the system to visit the whole phase space, 
the algorithm provides a probability which converges to the Boltzmann distribution. 

Among the many fields of application, the Metropolis algorithm has been widely used to 
investigate the equilibrium properties of polymeric chains, in particular of protein models. 
Important results were obtained by simulations of lattice model proteins concerning their 
free energy landscape 0, ^. In the following we shall focus our attention on chain models 
to describe proteins, although our results can be applied to other fields. 

Equation ([T]) describes a tailor-made dynamics which, in principle, has nothing to do 
with the actual dynamics of a polymer. The actual dynamics of the polymer is described (if 
one wants to describe implicitly the solvent) by Langevin's Equation [2] 
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where p is the momentum of a given particle, F is the force acting on it, 7 is the friction 
coefficient, m the mass and t] a stochastic variable describing the interaction with the solvent. 
This variable satisfies < ri{t) >= and < ri{t)r]{t') >= D'-f^6{t — t'), where the brackets 
indicate the average over the realizations of r] and D is the diffusion coefficient. On the 
other hand, in implementing the Metropolis algorithm, one is free to choose any kind of 
fancy move, with the only goal of speeding up the sampling of conformational space. If 
the chosen move allows enormous jumps across the conformational space, it is clear that the 
resulting dynamics has nothing to do with the actual dynamics of the polymer. However, the 
set of moves chosen for protein models is usually quite realistic, involving mainly local moves 
of the atoms (e.g., flips, crankschafts, etc. [5|]). Thus, one can ask whether the trajectories 
obtained with the Metropolis algorithm and small moves have some degree of realism, in 
the sense that they provide an approximation to the solution of Langevin's equation. 

Both Langevin's equation and the Metropolis algorithm are stochastic, containing some 
randomness. Consequently, what makes physical sense is not a single trajectory but trajec- 
tories averaged over the randomness. Thus, in the following we will compare solutions of the 
Langevin's equation averaged over the force rj exerted by the solvent with the trajectories 
generated by the Metropolis algorithm, averaged over independent runs. 

The relation between number of Monte Carlo steps and time was investigated in the 



case of spinoidal decomposition in two dimensions by Meakin and coworkers 



lattice gauge theory by Baillie and Johston 



6] and for 



or the simple case of a lattice model 
that the Metropolis simulations are 



of a a-helical hairpin, Rey and Skolnick showed 
independent on the set of moves chosen in the Metropolis algorithm and are consistent with 
Langevin dynamics simulations. In the present work, we want to investigate further on the 
conditions under which the average solution of Langevin's equation is well approximated by 
Metropolis algorithm for protein-like chains. 



II. THE THEORY 

Following ref. [2I, one can show that, under the assumption that Pap allows only small 
transitions 6x = x' — x, the master equation ([T]) solved by the Metropolis algorithm approx- 
imates a diffusive Fokker-Plank equation, which is equivalent to Langevin's equation. In 
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fact, writing w{x, Sx) = w{x — ^ x'), one can approximate 

d 

w(x — 6x, 6x)p(x — 6x, t) ~ w(x, 6x)p(x, t) — 5x— (w(x, 6x)p(x, t)) + 

ox 

1 92 

+ -{Sxf-^iwix,Sx)p{x,t)), (4) 

so that the master equation ([1]) becomes 

dpix ^ /* f 

— — ~ / w{x,6x)p{x,t)d{6x) — / 6x—[w{x,Sx)p{x,t)]d{Sx) + 

+ - {5xY-^-^[w{x,5x)p{x,t)]d{5x) — / w{x,5x)p{x,t)d{5x). 

and, since the first and last terms cancel each other, we get 

= iMx)p{x, t)) + l^ {B{x)p{x, t)) , (5) 

where A{x) = f d{6x) 6xw{x,6x) and B{x) = J d{6x) 6x^ w{x,6x). Note that A{x) and 
B{x) are nothing else but the average displacement and the average square displacement of 
the coordinate x (in other words, < 6x > and < 6x'^ >, respectively). 

It is then enough to show that, among all possible Fokker-Plank equations, the one which 
rules the Monte Carlo sampling is that associated with diffusion in a potential U. This is true 
if A{x) = —U'{x)/'y and B{x) = 2D. Using the jumping rate of the Metropolis algorithm 

E]) it is possible to calculate the values of A{x) and B{x), using the scheme developed in ref. 

91] . To achieve this result, use is made of the hypothesis that Pap{x, 6x) = Pap{x x') allows 
only small jumps. We define the small number R as the maximum displacement allowed to 
the coordinate x in a single move, and we assume that R is independent on x (in order to 
obtain, at the end, a homogeneous diffusion coefficient). For sake of simplicity, we can chose 
Papi^x) equal to (2R)^^ if —R < 6x < R and zero otherwise. Moreover, again due to the 
small jumps hypothesis, we can expand the jumping rate in 



W{x,6x) = WoPapiSx) 



mm 



1, exp 



U{x + 5x) - U{x) 



WoPap{Sx)mm 



_ U'{x)6x 
' f 

(6) 



In general, ai 6x = the sign of U {x+6x) —U {x) changes, and the minimum function switches 
from the value 1 to the exponential. We can thus break the integral of the definition of A{x) 
at 6x = 

/oo 
5xw{x, 5x)d{Sx) = 
-00 



The first two integrals cancel each other because they can be merged into the integral of an 
odd function on an even interval. Consequently, 

A{x) = -^0^^ • (7) 

In the same way one can calculate 

/oo 
5x'^w{x, 6x)d{6x) = 
-oo 

\U'(x)\-R^ 

--0^—0^^. (8) 

Since R is small, the second term of the latter expression is negligible with respect to the 
former, so that 

Bix) = wo^ (9) 

which is a constant. This corresponds to Langevin dynamics with 7 = GT/wqR^ and D = 
WqR'^/G (these expressions will be commented in Sect. lIVp . 



III. COMPARING LANGEVIN AND METROPOLIS SIMULATIONS 

The above derivation show that the Metropolis algorithm can be used to solve Langevin's 
equation, provided that the allowed moves are small. On the other hand, it gives no indi- 
cation about how small the move must be. To investigate this point, we have compared 
the average trajectories generated by the Metropolis algorithm with numerical solutions of 
Langevin's equation. 

n 

The protein model used in the following is a simplified Go model [lOj. Go models, at 
different degrees of geometric resolution, have been widely used in the literature to investi- 
gate thermodynamical and kinetic properties of proteins. In their (reversion they allow to 



perform massive simulations with small- to medium-sized proteins [III, 112, lid, llJ] ■ Another 
choice is to account also for the side-chain, as a single bead [l^, Q , in order to give a more 
realistic description of the protein without increasing much the computational cost. In fact, 
with this model it was possible not only to simulate the folding, but also the aggregation of 
a number of identical proteins Another possible choice is to give a full description of 



the atomic structure of amino acids, where the basic interacting unit is the atom 
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19| (for 



a careful review see ref. [20|). In order to perform the simulations we have employed a 
Go-model, where each amino acid is described as a spherical bead. Two amino acids which 
are in contact (whose Cq, distance is < 6.5A in the experimental native conformation and 
which are not consecutive along the chain) interact through a 6-12 Lennard- Jones potential, 
whose bottom lies at Rq = 0.8 x (i^, and energy —2 kcal/mol and whose cutoff is at 20 A. 
The other pairs of amino acids repell each other with a potential. 

In the implementation of the Metropolis algorithm, at each step an amino acid is chosen 
at random with flat probability. The chosen amino acid is rotated of an angle Aa around the 
axis defined by the previous and the following one, where Aa is a random number generated 
from a fiat distribution and constrained in the interval — a^f < Aa < a^/. This set of moves 
varies the dihedrals and the angles associated with the amino acids, while the bonds between 
consecutive amino acids are inextensible and display length Rb = 3.84A. The first and last 
amino acid of the chain diplay a further degree of freedom, that is the bond angle with the 
previous amino acid, which can be also moved at random. 



The Langevin equations of motion are integrated with the B.B.K. algorithm |2ll] using a 
time step At = 10/s. Each amino acid is linked to the consecutive ones through a harmonic 
potential with spring constant k = lOKg/s^. This is the only difference between the model 
used for the molecular dynamics and that used for the Metropolis simulations. Due to the 
stiffness of the spring, we do not expect that this produces relevant effects. 

In order to compare Langevin and Metropolis dynamics, we have performed a number of 
simulations at temperature T = 400°K, lower than the folding temperature (T/ = 440°K), 
with both methods on a short protein (60 residues), the SH3 domain of Src, starting from 
random generated conformations. In order to evaluate the dependence of the results on the 
length of the protein, we will also study betanova, a synthetic peptide built out of 18 residues 
and the monomer of HIV-PR, a protein of 99 residues. The solid curve in Fig. [1] is the 
energy of SH3 as a function of time calculated with the Langevin dynamics, averaged over 
50 independent runs. Metropolis simulations have been performed at the same temperature, 
using different values of the threshold aM of the angular move. For each of them, we have 
calculated the average energy as a function of the number of Monte Carlo steps. To transform 
the number of Monte Carlo steps (MCs) into time, we have calculated the time content tq 
of a single Monte Carlo step (tq = 1/wq), optimizing its value by means of a least-square 
minimization of the curve obtained from the Monte Carlo simulation versus that obtained 
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from the Langevin dynamics. Three examples of the resulting curves are also displayed in 
Fig. [T]with dotted {aM = 180°), dashed (q;m = 4°) and dotted-dashed (a*/ = 0.5°) lines. 

The values of tq and of the square-root of the average of residuals p obtained from 
the optimization are displayed in Fig. [21 These plots show that in order to have in the 
out-of-equilibrium regime an error in the determination of the energy smaller than few 
kcal/mol (i.e., a typical single non-bonded interaction in biological molecules) one needs to 
set om < 1°- Larger values of om lead to errors of the order of 4-5 kcal/mol in the whole 
out-of-equilibrium phase (cf. Fig. [2]), which can reach 10 kcal/mol in the first nanoseconds 
(cf. Fig. [1]). Of course, in the long-time limit the curves overlap, by virtue of the ergodic 
theorem. 

In Fig. [3] is displayed the time dependence of the mean dRMSD and q, two order pa- 
rameters which indicate to which extent a conformation of the protein chain is similar to 
the native conformation. The dRMSD is defined as X]i<j('^«j ~ '^fj^Y^'^^ where N is 
the number of amino acids of the chain, dij is the distance between two of them in the 
current conformation and dfj is the same quantity calculated in the native conformation. 
The parameter q is the fraction of contacts that a given conformation shares with the native 
conformation, having defined two amino acids to be in contact if they are closer than 6.5A. 
These curves show the same behaviour of the energy, a good description being only provided 
for aM < 1°- 

The time content of a Monte Carlo step in the most reliable case om = 0.5° is IMCS 
= 0.1 fs, and it increases almost linearly up to IMCS = 15 fs at aM = 30°, where it 
reaches a plateau (see Fig. [2]). A time step of 0.1 fs is quite small, smaller than that 
usually employed in molecular dynamics simulations. However, this limit is compensated by 
the fact that Metropolis algorithm are often computationally much faster than molecular 
dynamics algorithm. Moreover, if one requires a less stringent description of the initial stages 
of the dynamics, it is possible to use a larger threshold aM- To be noted that the above 
relationship holds for a potential shaped with a Lennard- Jones function of the kind used in 
these calculations, and thus will be more favourable for smoother (although, possibly, less 
realistic) potentials. 

We have also compared the energy fluctuations {< > — < E >'^y/'^ obtained by the 
Monte Carlo and Langevin simulations. The results obtained for aM < 0.5° are reported 
in Fig. m^a). Although the overall behaviour is quite similar, the Monte Carlo simulations 
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understimates the fluctuations in the flrst nanoseconds of up to 3 kcal/mol (i.e., ^ 5kT). 
As the system approaches equihbrium (cf. the last tens of nanoseconds), the two curves 
overlap better, as expected by virtue of the ergodic theorem. In order to obtain a better 
overlap, it is necessary to further reduce the value of om- In Fig- IHc) are displayed the same 
quantities as above, but obtained using aM < 0.05°. Note that we could simulate only the 
flrst 1.5 ns, due to the much larger computational cost of choosing such a small elementary 
move. However, in this case the match between the fluctuations obtained with the Monte 
Carlo algorithm and the Langevin dynamics is much better (root mean square residual is 
1.1 kcal/mol) than in the case where we used «m < 0.5° on the same time interval (root 
mean square residual is 3.1 kcal/mol, cf. Fig. 111(b)). 

Summing up, one can conclude that the Metropolis dynamics is a fast algorithm to 
describe the dynamics of mean quantities, but is not useful if one requires high precision 
(< kT) in the higher moments. 

IV. DEPENDENCE ON THE TEMPERATURE 

The link between Metropolis and Langevin dynamics, provided by Eqs. [7] and [9], displays 
the unrealistic feature that the effective friction coefficient 7 results linearly dependent on 
the temperature of the system (7 = QT/wqR"^), while the diffusion coefficient D results in- 
dependent on T (D = wqB? /&). Nonetheless, these two quantities satisfy Einstein's relation 
D = T j^. The independence of 7 and the linear dependence of D on the temperature can 
be reached if one assumes that the value of associated with the Metropolis step varies 
linearly with the temperature, i.e. w^iT^ = w'q-T, where the constant w'q is independent on 
T. This provides the correct 7 = Q/w'^R'^ and D = Tw'qR'^ /Q. 

To check numerically this result, we have performed both Metropolis (using um = 0.05°) 
and Langevin simulations at various temperatures ranging from T = 200K to T = 500K. The 
values of Wq obtained from the simulations of SH3 (following the same procedure described in 
the previous Section) are displayed with filled squares in Fig. [5] as a function of temperature. 
The correlation coefficient of the linear fit Wo{T) = w'q ■ T is 0.987, indicating a good linear 
behaviour (see also Table [11). Note that the low value of w'q suggests that for proteins of the 
size of SH3 and for temperature variations within the range of biological relevance, the error 
done assuming a wq independent of the temperature is small. 
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In addition, we have also studied the displacement of the centre of mass of the protein, 
calculating the diffusion coefficient both in the case of Langevin and Metropolis simulations. 
The value of Wq obtained from the comparison of the diffusion coefficients is also displayed 
in Fig. O (empty squares) and the linear coefficient Wq of the fitting line is listed in Table [B 
One would have expected Wq to be identical to Wq. Their difference suggests that the small 
approximation done in each Metropolis step with respect to the Langevin dynamics sums 
up to displace the centre of mass of the protein, giving rise to a less precise approximation. 

V. DEPENDENCE ON THE LENGTH OF THE PROTEIN 

The results discussed above are obtained with a protein composed of 60 amino acids. In 
order to elucidate to which extent these results depend on the length of the protein, we 
have repeated the same kind of molecular dynamics and Monte Carlo simulations for other 
two proteins, that is betanova (A^ = 18) and the monomer of HIV-1 protease (A^ = 99). 

The mean square energy deviation between the two kind of simulations as a function of 
aM are plotted in Fig. [6] with a solid curve for HIV-1 Protease and with a dashed curve for 
betanova. While HIV-1 protease behaves similarly to SH3 (cf. Fig. [2]), betanova displays 
low deviations even for large angles aM- In any case, at low values of aM, all of them display 
small values of p. For example, at aM = 1° the values of p are 2.34 for HIV-1 protease (i.e., 

= 99), 2.47 for SH3 (A^ = 60) and 0.62 for betanova (A^ = 18). Consequently, a threshold 
of aM = 1° is a safe choice for most cases of interest. 

The variation of tq (= 1/wq) with respect to A^ is displayed in the inset to Fig. [61 Again, 
the shortest protein displays a behaviour which is quite different from the other two, having 
a To which is an order of magnitude larger. On the other hand, the two larger proteins 
display values of tq of 0.2 and 0.4fs, respectively, that is of the same order of magnitude. 

The dependence of wo on the temperature for the different protein sizes is displayed in 
Fig. [5] and the coefficients Wq of the linear fit summarizd in Table [B These results show 
that Wq increases quite rapidly as the length of the protein increases. Thus one can conclude 
that in Metropolis simulations of proteins larger than SH3 it becomes necessary to account 
explicitely for the dependence of Wq on the temperature. 
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VI. CONCLUSIONS 



We have shown that Metropohs algorithm can be used to simulate the dynamics of a 
simplified protein model, provided that the residue are moved of small dihedrals and that 
the probability of chosing a move is independent of the conformation of the chain. If one 
is investigating mean quantities, a constrain of 1° on the dihedral move is enough. This 
approach can be useful not only because it gives a physical meaning to the trajectories 
obtained by Monte Carlo thermodynamical samplings, but also because for simple protein 
models the Metropolis algorithm can be faster than Langevin's dynamics. For example, a 
1-ns simulation of the model chain at T = 400 and with ccm = 1° took, with our codes and 
on the same PC, 8.5 s when performed with Metropolis algorithm and 16.0 s when performed 
with Langevin's algorithm. Of course, the possibihty to use Metropohs simulations to study 
the dynamics of a protein implies the knowledge of the conversion factor Tq. If one is 
interested in the precise value of Tq, some molecular dynamics runs are necessary in order to 
estimate it, and consequently the computational advantage of using Metropolis simulations 
is lost. If, on the other hand, one needs only the order of magnitudes of the folding time, 
one can use the conversion factors obtained above (i.e., Tq of the order of 0.5 fs at room 
temperature) . 
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protein 


length 




r 


^0 


r 


betanova 


18 


0.033 


0.601 


0.369 


0.994 


SH3 


60 


0.336 


0.987 


0.673 


0.968 


HIV-PR 


99 


1.451 


0.998 


2.451 


0.984 



Table I: The linear coefficient Wq (in (fs ■ K)~^) which controls the dependence of wq on the tem- 
perature for the three proteins, characterized by different lengths (third column) and the associated 
correlation coefficient r (fourth column). The value of Wq obtained comparing the displacement of 
the centre of mass is also listed in the fifth and sixth column. 
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Figure 1: The mean energy as a function of time for the Langevin simulation (continuous curve) 
and for three Monte Carlo simulations with different values of the maximum allowed dihedral angle 
move: om = 0.5 (dot dashed blue curve), = 4 (dashed green curve), om = 180 (dotted red 
curve). The inset shows a zoom in of the first 10 ns. 
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Figure 2: (a) The time content of a Monte Carlo step tq (= 1/wq) calculated by means of a least- 
square minimization of the curve of the energy (black dots), of the dRMSD (blue cross) and of 
the parameter q (red squares), (b) the value of the square-root of the average of residuals p as a 
function of the maximum allowed dihedral angle move used in Monte Carlo simulation. 
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Figure 3: Same as Fig. [T] but for the mean value of the dRMSD and of the parameter q. 
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Figure 4: (a) The energy fluctuations obtained by the Monte Carlo with qm = 0.5 (continuous 
curve) and Langevin simulations (dotted curve), (b) A zoom of (a), (c) The energy fluctuations 
obtained setting um = 0.05. 
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Figure 5: The values of wq obtained from the fit of the energy as a function of temperature for 
HIV-PR (filled triangles), SH3 (filled squares) and betanova (filled circles). The open symbols 
indicate the values of wq obtained by the fit of the displacement of the centre of mass. The straight 
lines indicate the linear fit. 
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Figure 6: The square-root of the average of residuals p as a function of the maximum allowed 
dihedral angle move used in Monte Carlo simulation for the HIV-1 protease (solid curve) and 
betanova (dashed curve). In the inset, the value of tq at aM = 1 for the three proteins, as a 
function of their length N. 
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